Due to the intense volatility often present in financial time series data, specialized modeling techniques are necessary to capture unusual behavior and make informed forecasts. This page visualizes all nine financial time series (stock prices of companies in related industries) to build a clearer picture of the economic landscape surrounding people’s transportation choices and the fallout of pandemic-induced phenomena. We will then select one time series per industry as a representative, to avoid over-complicating the analysis with stocks that are heavily influenced by factors beyond those that impact public transit. As a reminder, these are the stock prices we will be looking at, from the start of 2020 to the present day:
Rideshare Companies:
Uber
Lyft
Telecommunications Companies:
Zoom
Microsoft
Cisco
Oil Companies:
Shell
BP
Exxon
Chevron
Note: Much of the code and process on this page is re-purposed from:
Gamage, P. (2026). Applied Time Series for Data Science.
In this section, we examine the returns of each financial time series to comment on both the stationarity and volatility of the data. The financial data for these analyses are all stock prices for companies in industries related to public transit usage, and returns will be calculated from adjusted closing prices.
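Concretely, the Delt() function used below computes simple (arithmetic) daily returns by default:
\[r_t = \frac{P_t - P_{t-1}}{P_{t-1}},\]
where \(P_t\) is the adjusted closing price on day \(t\). This is also why the return column produced by Delt() is named Delt.1.arithmetic in the code below.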
plot <- plot_ly(rideshare, x = ~Date) %>%
  add_lines(y = ~Uber, name = "Uber Closing Price", line = list(color = 'black')) %>%
  add_lines(y = ~Lyft, name = "Lyft Closing Price", line = list(color = '#cc00bb')) %>%
  layout(title = "Rideshare Stock prices",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Price"),
         legend = list(title = list(text = "Legend")))
plot
Code
uber_returns <- Delt(uber$UBER.Adjusted)
uber_returns_df <- data.frame(date = uber$date, return = uber_returns)
uber_returns_df <- uber_returns_df[-1, ]
fig <- plot_ly(uber_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Uber Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Code
lyft_returns <- Delt(lyft$LYFT.Adjusted)
lyft_returns_df <- data.frame(date = lyft$date, return = lyft_returns)
lyft_returns_df <- lyft_returns_df[-1, ]
fig <- plot_ly(lyft_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Lyft Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
These plots show that Uber and Lyft stock prices follow similar patterns until mid-2022, where they diverge sharply. Both exhibit high volatility in 2020, with further spikes in 2022 and 2024, showing that these stocks are susceptible to outside events. Neither series appears to be stationary. As decided in the EDA section, we will continue analyzing the rideshare industry by fitting models to the Uber stock only.
Code
plot <- plot_ly(telecom, x = ~Date) %>%
  add_lines(y = ~Zoom, name = "Zoom Closing Price", line = list(color = 'blue')) %>%
  add_lines(y = ~Microsoft, name = "Microsoft Closing Price", line = list(color = '#008800')) %>%
  add_lines(y = ~Cisco, name = "Cisco Closing Price", line = list(color = '#bb0000')) %>%
  layout(title = "Telecommunications Stock prices",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Price"),
         legend = list(title = list(text = "Legend")))
plot
Code
microsoft_returns <- Delt(microsoft$MSFT.Adjusted)
microsoft_returns_df <- data.frame(date = microsoft$date, return = microsoft_returns)
microsoft_returns_df <- microsoft_returns_df[-1, ]
fig <- plot_ly(microsoft_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Microsoft Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Code
zoom_returns <- Delt(zoom$ZM.Adjusted)
zoom_returns_df <- data.frame(date = zoom$date, return = zoom_returns)
zoom_returns_df <- zoom_returns_df[-1, ]
fig <- plot_ly(zoom_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Zoom Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Code
cisco_returns <- Delt(cisco$CSCO.Adjusted)
cisco_returns_df <- data.frame(date = cisco$date, return = cisco_returns)
cisco_returns_df <- cisco_returns_df[-1, ]
fig <- plot_ly(cisco_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Cisco Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
These stocks differ greatly from one another. Cisco appears closest to stationary, with a relatively steady price that was less affected by events of the last five years, while the others are clearly non-stationary. All three show high volatility in 2020, and each has subsequent spikes that do not coincide with the others'. Overall, there is more volatility here than among the rideshare stocks. As decided in the EDA section, we will continue analyzing the telecommunications industry by fitting models to the Zoom stock only.
Code
plot <- plot_ly(oil, x = ~Date) %>%
  add_lines(y = ~BP, name = "BP Closing Price", line = list(color = '#66cc00')) %>%
  add_lines(y = ~Shell, name = "Shell Closing Price", line = list(color = '#cccc00')) %>%
  add_lines(y = ~Exxon, name = "Exxon Closing Price", line = list(color = '#bb0000')) %>%
  add_lines(y = ~Chevron, name = "Chevron Closing Price", line = list(color = '#0000bb')) %>%
  layout(title = "Oil Stock prices",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Price"),
         legend = list(title = list(text = "Legend")))
plot
Code
shell_returns <- Delt(shell$SHEL.Adjusted)
shell_returns_df <- data.frame(date = shell$date, return = shell_returns)
shell_returns_df <- shell_returns_df[-1, ]
fig <- plot_ly(shell_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Shell Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Code
bp_returns <- Delt(bp$BP.Adjusted)
bp_returns_df <- data.frame(date = bp$date, return = bp_returns)
bp_returns_df <- bp_returns_df[-1, ]
fig <- plot_ly(bp_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "BP Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Code
exxon_returns <- Delt(exxon$XOM.Adjusted)
exxon_returns_df <- data.frame(date = exxon$date, return = exxon_returns)
exxon_returns_df <- exxon_returns_df[-1, ]
fig <- plot_ly(exxon_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Exxon Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Code
chevron_returns <- Delt(chevron$CVX.Adjusted)
chevron_returns_df <- data.frame(date = chevron$date, return = chevron_returns)
chevron_returns_df <- chevron_returns_df[-1, ]
fig <- plot_ly(chevron_returns_df, x = ~date, y = ~Delt.1.arithmetic,
               type = 'scatter', mode = 'lines') %>%
  layout(title = "Chevron Daily Returns",
         xaxis = list(title = "Date"),
         yaxis = list(title = "Daily Return (Calculated by Adjusted Price)"))
fig
Each of these prices follows a very similar trend over the last five years, which tells us a lot about the nature of the oil industry. It appears somewhat less volatile than the other industries, although there was certainly erratic movement in 2020. None of these series appears to be stationary. As decided in the EDA section, we will continue analyzing the oil industry by fitting models to the Exxon stock only.
Model Fitting
First, we examine the ACF and PACF plots of the returns, along with the ACF plots of the absolute and squared returns, to determine what is needed prior to fitting a model.
Based on the absolute and squared returns, there appears to be autocorrelation at several lags, even though it is less visible in the raw returns. This is the classic signature of volatility clustering, so a GARCH model is appropriate; the following steps fit both GARCH and ARIMA+GARCH models to see which performs better, with \(p\) and \(q\) chosen via a grid search.
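For reference, a GARCH(\(p,q\)) model for the return innovations \(w_t\) takes the form
\[w_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{p} \alpha_i w_{t-i}^2 + \sum_{j=1}^{q} \beta_j \sigma_{t-j}^2,\]
where \(\epsilon_t\) is i.i.d. white noise with unit variance, so the conditional variance \(\sigma_t^2\) responds both to recent squared shocks (the ARCH terms) and to its own past values (the GARCH terms).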
Code
uber_arch_models <- list()
cc <- 1
for (p in 1:7) {
  for (q in 1:7) {
    uber_arch_models[[cc]] <- garch((uber_returns_df$Delt.1.arithmetic)^2,
                                    order = c(q, p), trace = FALSE)
    cc <- cc + 1
  }
}
uber_ARCH_AIC <- sapply(uber_arch_models, AIC)
uber_best_index <- which.min(uber_ARCH_AIC)
print(uber_arch_models[[uber_best_index]])
Title:
GARCH Modelling
Call:
fGarch::garchFit(formula = ~garch(1, 2), data = (uber_returns_df$Delt.1.arithmetic)^2,
cond.dist = "norm", trace = FALSE)
Mean and Variance Equation:
data ~ garch(1, 2)
<environment: 0x7f8a8fb59e18>
[data = (uber_returns_df$Delt.1.arithmetic)^2]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1 beta1 beta2
6.1623e-04 2.0665e-06 1.0000e+00 3.0219e-01 1.0000e-08
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 6.162e-04 6.613e-05 9.318 < 2e-16 ***
omega 2.066e-06 5.782e-07 3.574 0.000352 ***
alpha1 1.000e+00 1.826e-01 5.477 4.33e-08 ***
beta1 3.022e-01 3.477e-01 0.869 0.384799
beta2 1.000e-08 2.158e-01 0.000 1.000000
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
6259.749 normalized: 4.667971
Description:
Tue May 6 21:46:39 2025 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 2.051973e+05 0.000000000
Shapiro-Wilk Test R W 4.427170e-01 0.000000000
Ljung-Box Test R Q(10) 1.711598e+01 0.071836759
Ljung-Box Test R Q(15) 3.342570e+01 0.004096922
Ljung-Box Test R Q(20) 4.202570e+01 0.002744335
Ljung-Box Test R^2 Q(10) 8.599375e-01 0.999914255
Ljung-Box Test R^2 Q(15) 9.541431e-01 0.999999818
Ljung-Box Test R^2 Q(20) 1.386222e+00 0.999999996
LM Arch Test R TR^2 9.656504e-01 0.999988346
Information Criterion Statistics:
AIC BIC SIC HQIC
-9.328485 -9.309092 -9.328512 -9.321220
Three of the five GARCH(1,2) model coefficients are statistically significant, indicating that the conditional variance is driven by past squared returns. Prior to fitting an ARIMA+GARCH model, we must find optimal ARIMA parameters, cross-checking the grid search against auto.arima().
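Substituting the estimates above, the fitted GARCH(1,2) conditional-variance equation for the squared-return series is approximately
\[\sigma_t^2 = 2.07\times10^{-6} + 1.000\, w_{t-1}^2 + 0.302\, \sigma_{t-1}^2 + 10^{-8}\, \sigma_{t-2}^2,\]
although the insignificant \(\beta\) estimates mean the \(\sigma_{t-1}^2\) and \(\sigma_{t-2}^2\) terms should be interpreted with caution.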
Code
uber_returns <- na.omit(uber_returns)
uber_ts <- ts(uber_returns, start = decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
ARIMA.c <- function(p_min, p_max, q_min, q_max, data) {
  uber_results <- matrix(NA, nrow = 0, ncol = 6)
  colnames(uber_results) <- c("p", "d", "q", "AIC", "BIC", "AICc")
  for (p in p_min:p_max) {
    for (q in q_min:q_max) {
      for (d in 0:2) {
        if ((p + d + q) <= 6) {
          fit <- Arima(data, order = c(p, d, q))
          metrics <- c(p, d, q, fit$aic, fit$bic, fit$aicc)
          uber_results <- rbind(uber_results, metrics)
        }
      }
    }
  }
  # Convert to data frame
  uber_results_df <- as.data.frame(uber_results)
  colnames(uber_results_df) <- c("p", "d", "q", "AIC", "BIC", "AICc")
  return(uber_results_df)
}
uber_output <- ARIMA.c(0, 7, 0, 7, data = uber_ts)
uber_output[which.min(uber_output$AIC), ]
p d q AIC BIC AICc
metrics.48 3 0 1 -5259.465 -5228.258 -5259.402
Code
auto.arima(uber_ts)
Series: uber_ts
ARIMA(0,0,0) with non-zero mean
Coefficients:
mean
0.0013
s.e. 0.0009
sigma^2 = 0.001161: log likelihood = 2629.1
AIC=-5254.2 AICc=-5254.19 BIC=-5243.8
Since auto.arima() returns an ARIMA(0,0,0) model, we are unlikely to gain much from its mean model alone, so the ARIMA(2,0,3) specification will be selected instead. The following results are diagnostics from this model.
Code
uber_arima <- Arima(uber_ts, order = c(2, 0, 3))
uber_arima_res <- residuals(uber_arima)
summary(uber_arima)
Series: uber_ts
ARIMA(2,0,3) with non-zero mean
Coefficients:
ar1 ar2 ma1 ma2 ma3 mean
1.2656 -0.3372 -1.2864 0.4374 -0.0987 0.0013
s.e. 0.2121 0.2366 0.2112 0.2391 0.0292 0.0007
sigma^2 = 0.001153: log likelihood = 2636.6
AIC=-5259.2 AICc=-5259.12 BIC=-5222.79
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 3.196353e-05 0.03387395 0.0234167 NaN Inf 0.6908066 0.0006858808
Here we still see some large ACF values in the absolute and squared residuals, so we proceed to fit GARCH models to the ARIMA residuals and run diagnostics to evaluate which model is best.
Code
uber_arima_arch_models <- list()
cc <- 1
for (p in 1:7) {
  for (q in 1:7) {
    uber_arima_arch_models[[cc]] <- garch(uber_arima_res, order = c(q, p), trace = FALSE)
    cc <- cc + 1
  }
}
uber_ARIMA_ARCH_AIC <- sapply(uber_arima_arch_models, AIC)
uber_best_arima_index <- which.min(uber_ARIMA_ARCH_AIC)
print(uber_arima_arch_models[[uber_best_arima_index]])
Title:
GARCH Modelling
Call:
fGarch::garchFit(formula = ~garch(1, 1), data = uber_arima_res,
cond.dist = "norm", trace = FALSE)
Mean and Variance Equation:
data ~ garch(1, 1)
<environment: 0x7f8a47c0c588>
[data = uber_arima_res]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1 beta1
0.00031964 0.00012767 0.14125731 0.74191292
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu 3.196e-04 7.949e-04 0.402 0.687587
omega 1.277e-04 4.581e-05 2.787 0.005325 **
alpha1 1.413e-01 3.935e-02 3.590 0.000331 ***
beta1 7.419e-01 7.420e-02 9.999 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
2756.02 normalized: 2.055198
Description:
Tue May 6 21:47:13 2025 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 692.817441 0.0000000
Shapiro-Wilk Test R W 0.965332 0.0000000
Ljung-Box Test R Q(10) 3.163394 0.9773130
Ljung-Box Test R Q(15) 12.844822 0.6142825
Ljung-Box Test R Q(20) 23.423464 0.2684829
Ljung-Box Test R^2 Q(10) 2.206664 0.9944967
Ljung-Box Test R^2 Q(15) 6.986083 0.9580348
Ljung-Box Test R^2 Q(20) 8.589896 0.9871889
LM Arch Test R TR^2 2.689235 0.9973686
Information Criterion Statistics:
AIC BIC SIC HQIC
-4.104430 -4.088916 -4.104448 -4.098618
This model has higher Ljung-Box p-values than the plain GARCH fit, and every variance-equation coefficient (\(\omega\), \(\alpha_1\), \(\beta_1\)) is significant, making it the chosen model for this data. Ljung-Box p-values well above 0.05 (e.g., 0.977 at lag 10 for the residuals) indicate no significant evidence of remaining autocorrelation, so fitting the GARCH to the ARIMA residuals proved useful.
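Substituting the estimates above, the fitted GARCH(1,1) equation for the ARIMA residuals is approximately
\[\sigma_t^2 = 1.28\times10^{-4} + 0.141\, w_{t-1}^2 + 0.742\, \sigma_{t-1}^2.\]
Since \(\hat{\alpha}_1 + \hat{\beta}_1 \approx 0.88 < 1\), the fitted variance process is covariance-stationary, with volatility shocks that decay over time rather than persisting indefinitely.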
Title:
GARCH Modelling
Call:
fGarch::garchFit(formula = ~garch(2, 1), data = zoom_returns_df$Delt.1.arithmetic,
cond.dist = "norm", trace = FALSE)
Mean and Variance Equation:
data ~ garch(2, 1)
<environment: 0x7f8aabba49b0>
[data = zoom_returns_df$Delt.1.arithmetic]
Conditional Distribution:
norm
Coefficient(s):
mu omega alpha1 alpha2 beta1
-0.00046513 0.00004943 0.14661534 0.00000001 0.83143974
Std. Errors:
based on Hessian
Error Analysis:
Estimate Std. Error t value Pr(>|t|)
mu -4.651e-04 7.377e-04 -0.630 0.528
omega 4.943e-05 NaN NaN NaN
alpha1 1.466e-01 2.128e-02 6.889 5.6e-12 ***
alpha2 1.000e-08 NaN NaN NaN
beta1 8.314e-01 NaN NaN NaN
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Log Likelihood:
2702.637 normalized: 2.015389
Description:
Tue May 6 21:47:14 2025 by user:
Standardised Residuals Tests:
Statistic p-Value
Jarque-Bera Test R Chi^2 5020.5278002 0.0000000
Shapiro-Wilk Test R W 0.9250622 0.0000000
Ljung-Box Test R Q(10) 10.2111812 0.4221652
Ljung-Box Test R Q(15) 15.6764673 0.4038735
Ljung-Box Test R Q(20) 19.1373805 0.5129129
Ljung-Box Test R^2 Q(10) 5.6426543 0.8443395
Ljung-Box Test R^2 Q(15) 6.7425085 0.9644050
Ljung-Box Test R^2 Q(20) 9.6712697 0.9737486
LM Arch Test R TR^2 5.9684340 0.9176649
Information Criterion Statistics:
AIC BIC SIC HQIC
-4.023321 -4.003928 -4.023348 -4.016056
Code
zoom_returns <- na.omit(zoom_returns)
zoom_ts <- ts(zoom_returns, start = decimal_date(as.Date("2020-01-01")), frequency = 252)
Code
zoom_output <- ARIMA.c(0, 7, 0, 7, data = zoom_ts)
zoom_output[which.min(zoom_output$AIC), ]
p d q AIC BIC AICc
metrics 0 0 0 -5147.438 -5137.036 -5147.429
Code
#auto.arima(zoom_ts)
Code
zoom_arima <- Arima(zoom_ts, order = c(0, 1, 0))
zoom_arima_res <- residuals(zoom_arima)
summary(zoom_arima)
Series: zoom_ts
ARIMA(0,1,0)
sigma^2 = 0.002527: log likelihood = 2105.59
AIC=-4209.18 AICc=-4209.17 BIC=-4203.98
Training set error measures:
ME RMSE MAE MPE MAPE MASE ACF1
Training set 1.56105e-05 0.05025495 0.03416894 NaN Inf 0.9690134 -0.4810518
The model chosen above is an ARIMA+GARCH model that uses ARIMA(2,0,3) and GARCH(1,1). Therefore, the mathematical representation of this is as follows:
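Using the backshift operator \(B\), with \(x_t\) the mean-adjusted return series, the ARIMA(2,0,3) component is
\[\phi(B)x_t = \theta(B)w_t\]
where
\[\phi(B) = 1 - \phi_1 B - \phi_2 B^2\]
and
\[\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3,\]
together with the GARCH(1,1) variance equation
\[w_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega + \alpha_1 w_{t-1}^2 + \beta_1 \sigma_{t-1}^2,\]
with coefficient estimates as reported in the model summaries above.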
Note: method of getting model equation is from Gamage, P. (2026). Applied Time Series for Data Science.
The model chosen above is an ARIMA+GARCH model that uses ARIMA(0,1,0) and GARCH(4,1). Therefore, the mathematical representation of this is as follows:
ARIMA:\[\phi(B)(1-B)x_t = \theta(B)w_t\] where \[\phi(B) = 1\] and \[\theta(B) = 1\]
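The GARCH component adds the conditional-variance equation, written here in general form using the same convention as the fits above (first order counts the ARCH terms):
\[w_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega + \sum_{i=1}^{4} \alpha_i w_{t-i}^2 + \beta_1 \sigma_{t-1}^2.\]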
Note: method of getting model equation is from Gamage, P. (2026). Applied Time Series for Data Science.
The model chosen above is an ARIMA+GARCH model that uses ARIMA(4,0,2) and GARCH(1,1). Therefore, the mathematical representation of this is as follows:
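In general form, with \(x_t\) the mean-adjusted series, the ARIMA(4,0,2) component is
\[\phi(B)x_t = \theta(B)w_t,\]
where
\[\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4\]
and
\[\theta(B) = 1 + \theta_1 B + \theta_2 B^2,\]
with GARCH(1,1) errors
\[w_t = \sigma_t \epsilon_t, \qquad \sigma_t^2 = \omega + \alpha_1 w_{t-1}^2 + \beta_1 \sigma_{t-1}^2.\]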